Nucleotide sequences between the spike (S) and membrane (M) protein genes of the OC43 strain of human coronavirus
were obtained from PCR-amplified viral mRNAs. Sequence analysis of this region revealed the presence of two
ORFs encoding proteins of 12.9 and 9.5 kDa. These two proteins were identified as putatively nonstructural (ns) due to
their homology to the corresponding BCV ns gene products. Northern blot analysis indicated that each of these two
genes was present on a separate mRNA (5 and 5-1, respectively). In vitro translation analyses demonstrated that the
HCV-OC43 9.5-kDa protein, like its BCV counterpart, is poorly translated when situated downstream of the 12.9-kDa
ORF, although immunofluorescence studies did confirm its presence in infected cells. Sequence analysis showed that
a large portion of the 3'-end of the leader sequence is present within the viral genome, upstream of the 12.9-kDa ORF.
In addition, two ORFs encoding potential 4.9- and 4.8-kDa ns proteins in BCV are absent in HCV-OC43, although a
corresponding mRNA 4 was found at a very low level. These results demonstrate that these two putative ns proteins
are not essential for virus replication, at least in HRT-18 cells. © 1993 Academic Press, Inc.


Human coronaviruses (HCV) are recognized as the causative agents of
respiratory diseases: they are responsible for 15-35% of common colds (1).
Other disease associations have been suggested, such as the involvement
of HCV in severe diarrhea (2) or neurologic diseases such as multiple
sclerosis (3-5). Recently, some primates were shown to develop neurologic
disease after inoculation with a murine-related coronavirus (6). Indeed,
some strains of murine hepatitis virus (MHV) in rodents have been used as
model systems to study chronic and acute hepatic and neurologic
diseases (7).

Coronaviruses contain a single-stranded, capped, and polyadenylated
positive-sense RNA molecule of 27 to 31 kb, which directs the synthesis,
by leader-primed transcription, of a nested set of six to eight subgenomic
mRNAs with common 3'-ends but extending to different lengths toward the
5'-end of the genome. Each mRNA possesses a common 5'-end leader of
about 72 nucleotides, derived from the 5'-end of genomic RNA. Only the
5'-most open reading frame (ORF) of each mRNA is usually translated,
although these mRNAs contain multiple ORFs. The intergenic regions
between ORFs contain a stretch of sequence varying from 7 to 18
nucleotides that is homologous to the 3'-end of the leader RNA. This
region is presumably involved in regulation of the transcription of viral
mRNAs (8).

1 Sequence data from this article have been deposited with the
GenBank Data Library under Accession No. M99576.

2 To whom reprint requests should be addressed.

Along with the genes for structural proteins (HE, S, M, and N),
coronavirus genomes contain a number of ORFs potentially encoding
nonstructural proteins. The number and positions of these genes differ
among coronavirus species. In avian infectious bronchitis virus (IBV),
there are five ORFs, three located between the S and M genes and two
located between the M and N genes (9, 10). In porcine transmissible
gastroenteritis virus (TGEV), there are four ORFs, three located between
the S and M genes and one located at the 3' side of the N gene (11-13).
In MHV, there are five ORFs, three located between the S and M genes
(14-16), one at the 5' side of the S gene (17, 18), and one located
within the N gene (19). In bovine coronavirus (BCV), there are six ORFs,
four located between the S and M genes (20), one at the 5' side of the
HE gene (21), and one located within the N gene (22).

HCV-OC43 and BCV show remarkable similarity, as shown by
oligonucleotide fingerprinting analysis of their genomic RNAs (23, 24),
by immunoprecipitation of viral proteins with specific antisera (25),
and from the sequences of the N, M, and HE genes of HCV-OC43 (26-28).
These studies suggest that HCV-OC43 and BCV may have diverged fairly
recently.

In this report, we present the genome sequence between the S and M
genes of HCV-OC43. Only two ORFs, encoding potential proteins of
12.9 kDa (mRNA 5) and 9.5 kDa (mRNA 5-1), were found, revealing a
major difference between putative nonstructural proteins of BCV and
HCV-OC43. We also found about 47 nucleotides of the leader sequence
within the genomic RNA.

The source and cultivation of the HRT-18 human rectal tumor cells and
the OC43 strain of HCV, as well as the preparation, reverse
transcription, polymerase chain reaction (PCR) amplification, cloning,
and sequencing of viral RNA (both mRNA and genomic RNA), were as
described elsewhere (27). For Northern blot analysis, poly(A)-containing
RNA was selected with the PolyATract mRNA isolation system (Promega,
Fisher Scientific, Montreal, Quebec, Canada) according to the
manufacturer's instructions, size fractionated by electrophoresis in
1% (w/v) agarose gels containing 5.3% formaldehyde, and transferred onto
Hybond-C extra (Amersham Canada Ltd., Oakville, Ontario, Canada)
nitrocellulose filters (29). Blots were hybridized with random-primed
32P-labeled (ICN Biomedicals Canada Ltd., Mississauga, Ontario, Canada)
DNA probes at 42° as described previously (30).

The nucleotide sequence of the region between the S and M genes of
HCV-OC43 and the predicted amino acid sequences of open reading frames
are shown in Fig. 1, together with the potential glycosylation site and
intergenic consensus sequences. This region contains ORFs for two
proteins of 12.9 and 9.5 kDa, in that order. An 11-amino acid ORF with
the potential to encode a very small polypeptide of 1.33 kDa is also
observed upstream of the 12.9-kDa ORF.

The 12.9-kDa ORF (nucleotides 132 to 458) potentially encodes a
109-amino acid protein. This protein has an amino acid sequence identity
of 96.3% with the 12.7-kDa protein of BCV (20). As in BCV, there is one
potential N-linked glycosylation site at amino acid position 18 (Fig. 1).
The single putative initiation codon for the translation of this protein
is in a context not frequently used for initiation of protein synthesis
(31; Table 1). The UCAAAAC consensus intergenic sequence is present
about 107 nucleotides upstream of the initiation codon, within the
3'-terminus of the S gene (Fig. 1).

The 9.5-kDa ORF begins at nucleotide 448 and ends at nucleotide 699.
It predicts an 84-amino acid protein. This protein has an amino acid
sequence identity of 96.4% with the 9.5-kDa protein of BCV (20). As in
BCV, there are two methionine residues at the putative first and third
codons of the protein. The first putative initiation codon is not in a
favorable context for initiation of protein synthesis, whereas the
presence of a G residue at position +4 of the second putative initiation
codon would presumably improve the situation (31; Table 1). The 9.5-kDa
protein contains one large hydrophobic domain that comprises more than
50% of the molecule and is potentially transmembranous (data not shown).
The UCCAAAC consensus sequence is present about 120 nucleotides upstream
of the first potential initiation codon, within the 12.9-kDa protein
gene (Fig. 1).

The 11-residue ORF (nucleotides 34 to 66) predicts a peptide of
1.33 kDa, which shows 82% identity (9/11 amino acids) with the
N-terminus of the BCV 4.9-kDa nonstructural protein (20). A consensus
intergenic sequence UCAAAC is found within the S gene (Mounir and
Talbot, manuscript in preparation). This small ORF is followed by a
47-nucleotide stretch that bears a striking resemblance to the 3'-half
of the 82-nucleotide HCV-OC43 leader sequence (26), as shown by 37
identical nucleotides (Fig. 2).
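The coordinates and identities quoted above can be cross-checked with a few lines of arithmetic. The following is only a sketch: it assumes that the nucleotide positions are 1-based and inclusive, and it makes no claim about whether the stated ranges include the stop codon, since the text does not say. The overlap and frame offset it reports are derived from the quoted coordinates and are not figures stated in the text. All names (`ORFS`, `codon_count`, `percent_identity`) are illustrative.

```python
# ORF coordinates as quoted in the text (assumed 1-based, inclusive).
ORFS = {
    "11-residue ORF": (34, 66),
    "12.9-kDa ORF": (132, 458),
    "9.5-kDa ORF": (448, 699),
}


def codon_count(start, end):
    """Return the number of nucleotide triplets in an inclusive range."""
    length = end - start + 1
    if length % 3 != 0:
        raise ValueError(f"range {start}-{end} is not a multiple of 3")
    return length // 3


def percent_identity(identical, total):
    """Percent identity rounded to one decimal place."""
    return round(100.0 * identical / total, 1)


codons = {name: codon_count(*span) for name, span in ORFS.items()}

# Derived quantities (not stated in the text): the two large ORFs
# overlap, and the downstream ORF starts in a different reading frame.
overlap_nt = ORFS["12.9-kDa ORF"][1] - ORFS["9.5-kDa ORF"][0] + 1
frame_offset = (ORFS["9.5-kDa ORF"][0] - ORFS["12.9-kDa ORF"][0]) % 3

# Identities quoted in the text: 9 of 11 residues, 37 of 47 nucleotides.
peptide_identity = percent_identity(9, 11)
leader_identity = percent_identity(37, 47)

for name, n in codons.items():
    print(f"{name}: {n} triplets")
print(f"overlap: {overlap_nt} nt, frame offset: {frame_offset}")
print(f"peptide identity: {peptide_identity}%")
print(f"leader-like stretch identity: {leader_identity}%")
```

Under these assumptions the triplet counts reproduce the 11, 109, and 84 residues given in the text, and 9/11 rounds to the 82% quoted for the small peptide.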

To identify the subgenomic RNAs that encode the structural and
nonstructural proteins, we performed Northern blot analysis using DNA
probes (the localization of these probes is schematized on the right of
Fig. 3). The M probe, which encompasses the M protein gene plus the
first 80 nt of the N protein gene, detected nine HCV-OC43-specific RNAs
that have been numbered 1, 2, 2-1, 3, 4, 5, 5-1, 6, and 7 in order of
decreasing size. The 9.5-kDa probe hybridized to RNAs 1 to 5-1, but not
to RNAs 6 and 7. The 12.9-kDa probe hybridized to RNAs 1 to 5, but not
to RNAs 5-1, 6, and 7. The S2 probe, which extends into the region
upstream of the 12.9-kDa ORF, hybridized to RNAs 1 to 4 only (Fig. 3).
These results lead us to propose that RNAs 5 and 5-1 encode the
nonstructural proteins of 12.9 and 9.5 kDa, respectively, a situation
similar to that for the BCV homologues (20) and different from that for
MHV, where only one transcript (mRNA 5) is utilized for the synthesis of
both the 13- and 9.6-kDa ns proteins (14, 32).

Since the initiation codon for the BCV 12.7-kDa ns protein is in a more
favorable context for initiation of translation than that of the
HCV-OC43 12.9-kDa ns protein (20; G instead of U in position -3), we
wanted to test the translatability of both ORFs, either independently or
in tandem. In BCV, in vitro translation experiments using a synthetic
transcript containing both the 12.7- and 9.5-kDa ORFs in tandem
demonstrated that the majority of the protein synthesized was the
upstream 12.7-kDa protein (20). In contrast, when the same experiment
was performed on an MHV RNA containing both ORFs (13 and 9.6 kDa), the
downstream ORF (9.6 kDa) was preferentially synthesized (14). When we
performed a similar translation experiment with a synthetic HCV-OC43
RNA containing both ORFs (see clone A, Fig. 4), we observed that, as in
BCV, the upstream 12.9-kDa protein was preferentially synthesized
(Fig. 4, lane 2). However, when synthetic RNA containing only one of
the two ORFs (clones B and C; Fig. 4) was translated in vitro, both
proteins were efficiently synthesized (Fig. 4, lanes 1 and 3,
respectively), although small amounts of unexpected products were
observed. The latter polypeptides are most likely nonspecific
products, because all constructs were shown

Fig. 1. Nucleotide and predicted amino acid sequence between the S and M genes of HCV-OC43. The leader sequence is doubly underlined.
The potential N-glycosylation site (°) is indicated. Asterisks indicate stop codons. Consensus intergenic sequences are underlined. Individual
amino acids correspond to amino acid differences in the Mebus strain of BCV (20) as compared to the HCV-OC43 strain.


to be correct by sequencing. Indeed, such nonspecific products were
also observed, albeit at lower levels, when no mRNA was added (Fig. 4,
lane 4). We also showed that the HCV-OC43 9.5-kDa protein is made
during infection using antibody directed against the MHV 9.6-kDa
protein (32), which specifically stained


TABLE 1

Gene product    Sequences surrounding    Intergenic initiation sequence
                AUG initiation codon     of downstream gene product

HE a            AAAAUGU                  ACUAAAC
S b             AACAUGU                  UCUAAAC
12.9 kDa ORF    UUAAUGG                  UCAAAAC
9.5 kDa ORF     UAAAUGU                  UCCAAAC
M c             AUUAUGA                  UCCAAAC
N d             AGGAUGU                  UCUAAAU
Consensus e     (A/G)CCAUGG

a Zhang et al., 1992 (28).
b Mounir and Talbot (manuscript in preparation).
c Mounir and Talbot, 1992 (27).
d Kamahora et al., 1989 (26).
e Kozak et al., 1989 (31).


HCV-OC43-infected HRT-18 cells in an immunofluorescence assay (data
not shown).
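The reasoning used above to assign ORFs to RNAs 5 and 5-1 from the Northern blot can be written out as a small sketch. It assumes only what the text describes: the subgenomic RNAs form a nested, 3'-coterminal set, so a probe is expected to hybridize to every RNA long enough to contain its target, and the smallest RNA detected is the one whose 5'-proximal region carries that target. The M probe is left out because it also covers the first 80 nt of the N gene and therefore detects all nine RNAs. The names `RNAS`, `OBSERVED`, `is_nested`, and `smallest_detected` are illustrative.

```python
# RNAs in order of decreasing size, as numbered in the text.
RNAS = ["1", "2", "2-1", "3", "4", "5", "5-1", "6", "7"]

# Hybridization pattern reported in the text for three of the probes.
OBSERVED = {
    "S2": ["1", "2", "2-1", "3", "4"],
    "12.9-kDa": ["1", "2", "2-1", "3", "4", "5"],
    "9.5-kDa": ["1", "2", "2-1", "3", "4", "5", "5-1"],
}


def is_nested(detected):
    """True if the detected RNAs are exactly the largest n RNAs.

    This is the pattern expected for a 3'-coterminal nested set.
    """
    return detected == RNAS[: len(detected)]


def smallest_detected(detected):
    """Return the smallest RNA species that a probe hybridized to."""
    return max(detected, key=RNAS.index)


assignments = {}
for probe, detected in OBSERVED.items():
    if not is_nested(detected):
        raise ValueError(f"pattern for probe {probe} is not nested")
    assignments[probe] = smallest_detected(detected)

for probe, rna in assignments.items():
    print(f"{probe} probe: smallest RNA detected is RNA {rna}")
```

With the reported pattern, the 12.9-kDa probe resolves to RNA 5 and the 9.5-kDa probe to RNA 5-1, which is the assignment proposed in the text; the S2 probe, which reaches into the region upstream of the 12.9-kDa ORF, resolves to RNA 4.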

Recently, it was reported that BCV is closely related to HCV-OC43 at
both the protein and RNA levels (20, 26-28). In the present study, we
found a major difference in the human coronavirus, represented by the
absence of the genes potentially encoding two nonstructural proteins of
4.9 and 4.8 kDa in BCV (20). No attempt to detect these two small
proteins in BCV-infected cells or virions has yet been reported. Their
absence in HCV-OC43 indicates that they are not essential for virus
replication, at least in HRT-18 cells. Similarly, the ns2, ns4, and ns5a
nonstructural proteins of MHV were found not to be essential for virus
replication in transformed cells (33, 34). The apparent presence of low
levels of mRNA 4 in HCV-OC43-infected cells, most likely in the absence
of an associated protein product, suggests that this mutation may be a
recent event. However, it remains possible, although unlikely, that a
putative 11-residue peptide is expressed from this mRNA.

The absence of two putative nonstructural proteins
in HCV-OC43, if they are found to be expressed in
BCV-infected cells, as well as amino acid differences

BCV     GATTGCGAGCGA-TTTGCGTGCGTGCATCCCGCTTCA-CTGATCTCTTGTTAGATCTTTTTATAATCTAAAC-             70
          *** ******  *********************** *********************** ******** *
OC43 a  --TTGTGAGCGAAGTTGCGTGCGTGCATCCCGCTTCACCTGATCTCTTGTTAGATCTTTTTCTAATCTAATCTAAATTTTAAGG  82
                                           ** ***********************  ******  *     ** ***
OC43 b                                  ---CA-CTGATCTCTTGTTAGATCTTTTTGCAATCTA-GCATTTGTTAAAGT  47

Fig. 2. Comparison of leader sequences between HCV-OC43 (26), BCV (Hoffmann et al., 1991, GenBank Accession No. M62375), and the 47
nucleotides upstream of the 12.9-kDa ORF. Identical nucleotides are indicated with asterisks. a Leader sequence of HCV-OC43 (26). b Sequences
found upstream of the HCV-OC43 12.9-kDa ns protein gene (doubly underlined in Fig. 1).


within the S protein (Mounir and Talbot, manuscript in preparation),
could be involved in the apparently preferential respiratory tropism of
HCV-OC43, which contrasts with the presumed preferential enterotropism
of BCV. Indeed, the S proteins of MHV and TGEV were suggested to be
important in tissue tropism (35, 36). However, replication of BCV in the
respiratory tract has been reported (28) and OC43-like human enteric
coronaviruses have been isolated (37). Thus, it is premature to draw
conclusions about differential tropism of these two viruses.

The sites surrounding the putative protein synthesis initiation codon
in mRNAs 5 and 5-1, UUAAUGG for the 12.9-kDa ORF and UAAAUGU for the
9.5-kDa ORF (Table 1), are not those usually used for initiation of
eukaryotic protein synthesis, whereas the AUG initiation codons for the
HE, S, M, and N genes are in the context AAAAUGU (28), AACAUGU (Mounir
and Talbot, manuscript in preparation), AUUAUGA (27), and AGGAUGU (26),
respectively, all of them frequently found as functional initiation
sites where A is present at the -3 position relative to the A of AUG
(31). The second potential initiation codon for the 9.5-kDa ORF would be
in a more favorable context for initiation of protein synthesis, given
the presence of a G residue at position +4 (Fig. 1 and Table 1). The
actual initiation site cannot be inferred before the N-terminal
sequence of the 9.5-kDa protein is determined. Interestingly, although
the initiation codon for the 12.9-kDa protein is in a less favorable
context than the one for the BCV 12.7-kDa protein, which has a G
instead of a U at the -3 position (20), it is nevertheless
preferentially used when in tandem with the 9.5-kDa ORF.
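The comparison of initiation contexts above rests on two positions: the nucleotide at -3 and the nucleotide at +4 relative to the A of AUG. The sketch below tabulates those two features for the seven-nucleotide windows listed in Table 1. It assumes each window runs from -3 to +4, which is consistent with the AUG sitting at the fourth to sixth letters of every entry, and it deliberately reports only the two features rather than an overall "strong" or "weak" verdict, since the text does not define such a score. The names `CONTEXTS` and `context_features` are illustrative.

```python
# Seven-nucleotide windows from Table 1 (assumed to span -3 to +4).
CONTEXTS = {
    "HE": "AAAAUGU",
    "S": "AACAUGU",
    "12.9-kDa ORF": "UUAAUGG",
    "9.5-kDa ORF": "UAAAUGU",
    "M": "AUUAUGA",
    "N": "AGGAUGU",
}


def context_features(window):
    """Return (purine at -3, G at +4) for a -3..+4 window around AUG."""
    if len(window) != 7 or window[3:6] != "AUG":
        raise ValueError(f"expected NNNAUGN, got {window!r}")
    minus3 = window[0]
    plus4 = window[6]
    return (minus3 in "AG", plus4 == "G")


features = {gene: context_features(w) for gene, w in CONTEXTS.items()}

print(f"{'gene':<14}{'-3':<4}{'+4':<4}{'purine at -3':<14}{'G at +4'}")
for gene, window in CONTEXTS.items():
    purine, g4 = features[gene]
    print(f"{gene:<14}{window[0]:<4}{window[6]:<4}{str(purine):<14}{g4}")
```

For these windows, the HE, S, M, and N entries all carry an A at -3, whereas both ns ORFs carry a U there; only the 12.9-kDa ORF window has a G at +4. The second AUG of the 9.5-kDa ORF is not included because its full window is not listed in Table 1.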

It is noteworthy that we found 47 nucleotides of the leader of HCV-OC43
in the sequence between the S and M genes (Fig. 2). This situation is
not predicted by the proposed models of leader-primed transcription (8),
because part of the leader sequence (47 nt of 82 nt) is found within the
genomic RNA instead of at the 5'-end of the transcript. It may be a
result of recombination, whereby HCV-OC43 lost the sequence coding for
the 4.9- and 4.8-kDa ns proteins and developed a mechanism for
conservation of the ORF encoding the 12.9-kDa protein. This sequence
does not contain one of the novel transcription initiation signals
reported recently (38). We are currently investigating the role of this
sequence in virus infection.

The functions of the putative 12.9- and 9.5-kDa proteins in infected
cells are not known. These small coronavirus proteins, which have both
hydrophobic and charged domains, have been suggested to act as a
membrane-anchoring region for structural proteins during virus assembly,
or to play a role in membrane association of the viral polymerase during
replication


[Fig. 3 image: Northern blot panels for the M, 9.5-kDa, 12.9-kDa, and S2 probes]



Fig. 3. Northern blot analysis of RNA from uninfected cells (U) and infected cells (I). RNAs were revealed with the probes schematized to scale
on the right (base numbers are from Fig. 1, except for 1501, which corresponds to base 737 in Ref. 27): a 900-bp EcoRI-ScaI fragment
containing the M region and the first 80 nucleotides of the N region, bases 605 through 1501, was designated the M probe; a 327-bp
HincII-BamHI fragment containing bases 416 through 743 was designated the 9.5-kDa probe; a 362-bp EcoRI-BsmI fragment containing bases
1 through 362 was designated the 12.9-kDa probe; a 1.8-kb EcoRI-EcoRI fragment containing the S2 region (Mounir and Talbot, manuscript in
preparation) and bases 1 through 126 was designated the S2 probe. Subgenomic mRNAs are indicated on the left. Total RNA was used for
hybridization with the M probe, whereas poly(A)-mRNA was used with the other probes.

Fig. 4. In vitro translation of transcripts from clones schematized to scale on the right: clone A, lane 2; clone B, lane 1; clone C, lane 3. Lane 4,
in vitro translation with no mRNA added. The migration of molecular weight markers (LKB, Canlab, Pointe-Claire, Quebec, Canada) is shown on
the right and that of the two viral proteins on the left. Plasmid constructs 5'-12.9-kDa ORF-9.5-kDa ORF-3' (clone A) and 5'-12.9-kDa ORF-3'
(clone B) were created by unidirectional deletion of the 1.5-kb insert in pBluescript II SK(+) using exonuclease III and mung bean nuclease
(Stratagene, La Jolla, CA). A construct 5'-9.5-kDa ORF-3' (clone C) was prepared by removing from clone A a BsmI-BamHI fragment that begins
in the polylinker region of the pBluescript II SK(+) vector and ends 86 nucleotides upstream from the 9.5-kDa ORF. Recombinant plasmids were
linearized with BssHII (clones A and C) or with NaeI (clone B) and transcribed in vitro with T3 RNA polymerase (42). Approximately 1 ng of in
vitro-transcribed RNA was translated in a wheat germ cell-free extract (Promega, Fisher Scientific, Montreal, Quebec, Canada) using 1 mCi/ml of
L-[35S]cysteine (1064 Ci/mmol; Amersham) as the radiolabeled precursor. The conditions for translation were those recommended by the
manufacturer, except that potassium acetate was optimized at 40 mM. Translation products were analyzed under reducing conditions on a 20%
SDS-polyacrylamide gel (43) and revealed by fluorography with Enlightning (NEN Dupont, Montreal, Quebec, Canada).


(9, 10, 15, 16, 20, 32). Recently, proteins analogous to the HCV-OC43
and BCV 9.5-kDa molecules were shown to be present in IBV and TGEV
virions and were termed small membrane (sM) proteins (39, 40). We have
shown that the HCV-OC43 9.5-kDa ns protein, like its BCV and MHV
counterparts, is expressed in virus-infected cells. It will also most
likely be found in virions. The conservation of the 12.9-kDa protein
also suggests that it fulfills an important function in coronavirus
biology.

The biological importance of the various nonstructural proteins encoded
by coronaviruses remains to be investigated, and some of them may turn
out to be structural components. So far, only the proteins encoded by
mRNAs 2, 4, and 5b of MHV (17, 18, 32, 41) and mRNAs 5-1 of BCV (20) and
HCV-OC43 (this study) were shown to be produced in infected cells. Our
study emphasizes the importance of further work on the biological
functions of coronavirus nonstructural and novel structural proteins.